Detecting Collaboration Regions in a Chat Session

Authors

  • Dan Banica
  • Stefan Trausan-Matu
  • Traian Rebedea
Abstract

The paper presents an approach and a software system for the automatic detection of collaboration regions in a chat session. Although there is no unanimously accepted definition of a good collaboration region, such regions are generally easy to recognize, because their most important properties are known: they contain replies from several participants, the replies are on-topic, and the participants elaborate together, building on each other's ideas. This is in contrast to the case in which participants discuss in parallel, ignoring each other. Perfectly detecting collaboration regions involves understanding natural language, which is an AI-complete problem that cannot be solved at the moment. However, we believe that good approximations can be obtained using heuristics. In this paper we present a few such techniques, as well as a framework for detecting collaboration regions starting from a Bakhtinian perspective.

Introduction

Collaborative learning is an approach to the educational process closely related to the theories of Lev Vygotsky (1978). According to him, social interaction plays a central role in the development of cognition. Cultural tasks developed by a child originate in relations between individuals. A key concept here is the Zone of Proximal Development (ZPD), which represents the range of tasks that somebody is able to achieve when assisted, but not independently. Vygotsky claims that the role of education is to offer children experiences that are in their ZPD, thereby extending the area of independently solvable tasks.

Computer Supported Collaborative Learning (CSCL) aims to facilitate interactions between students, or between tutors and students, using computers. Numerous tools allow users located far apart to communicate, chat systems being among the most representative (Stahl, 2006). In this paper we present an approach for analyzing a chat discussion and identifying regions with good collaboration.
Such regions occur when several participants are involved, discuss on-topic and elaborate together, as opposed to the case in which they ignore each other, each presenting only their own ideas.

The corpus used for analysis was developed at the University “Politehnica” of Bucharest and consists of chats held in the VMT environment (Stahl, 2009), which has an important advantage: it allows participants to specify which reply they are answering, using explicit links. This is done by clicking another utterance before submitting a reply. The scenario assigned for the chats is the following: each participant must choose a collaborative technology (chat, blog, wiki or forum) and, in the first part of the talk, must try to convince the others that this technology is the best. In the second part, the learners must try to reach a consensus by discussing how they could integrate the technologies in order to obtain the best usage scenario in a company.

This paper is structured as follows: section 2 presents the theoretical framework and the main algorithms used by the system. Section 3 describes the heuristics that can be applied in order to estimate the collaboration in a chat, while section 4 presents several results. We end the paper with conclusions and future work.

Theoretical Framework

As a starting point, we used one of Mikhail Bakhtin’s ideas: “Utterances are not indifferent to one another, and are not self-sufficient; they are aware of and mutually reflect one another. These mutual reflections determine their character. Each utterance is filled with echoes and reverberations of other utterances to which it is related by the communality of the sphere of speech communication. Every utterance must be regarded primarily as a response to preceding utterances of the given sphere (we understand the word ‘response’ here in the broadest sense). Each utterance refutes, affirms, supplements, and relies on the others, pre-supposes them to be known, and somehow takes them into account” (Bakhtin, 1986).
Although it is quite obvious that implementing a computer program starting from this theory will lead to some simplifications of Bakhtin’s ideas, the theory is still very useful because it consistently offers a perspective from which to investigate the conversation. According to Bakhtin, each utterance adds some aspects to the discussed topics and at the same time takes into account the aspects revealed by previous utterances. The extent to which an utterance is based on another varies: it can be an explicit answer, or it can contain only an “echo” of the previous one. The mere fact that the author is aware of the previous utterance indicates an existing link. From this point of view, between any two utterances there is some degree of collaboration, lower or higher.

In our system we modeled this as a complete, weighted graph, in which the utterances are the nodes and the weight of an edge is the degree of collaboration between the two utterances it connects. We will call this the collaboration graph. The degree of collaboration (the weights of the edges) is estimated using heuristics.

A zone (region) of the chat is a set of at least two nodes corresponding to consecutive utterances in the chat. The terms zone and region are used interchangeably. The total collaboration of a zone is defined as the sum of the individual collaborations between utterances inside that zone. Figure 1 illustrates this notion: the total collaboration of region S is the sum of the weights of the edges that are completely contained in the rectangle. Although the collaboration graph is a complete graph, for simplicity not all the edges are drawn in the figure.

Note that the total collaboration is not a measure of how good a collaboration region is. A long region will eventually accumulate a large total collaboration without necessarily being a good collaboration zone in the sense we want.
To solve this difficulty we defined the notion of attenuated collaboration of a zone, or simply collaboration of a zone, which is the total collaboration divided by some function that increases with the zone’s length. A zone with good collaboration can then be defined as a zone whose collaboration is above a threshold.

Figure 1. Total collaboration of a region.

So far we have defined some notions with the purpose of quantifying a region with good collaboration, starting from collaborations between pairs of utterances. We can now design an algorithm for detecting the zones with good collaboration. First, we assume that the weights in the graph are already known, reducing the problem to one involving graphs only. Section 3 presents methods for actually computing these weights (i.e. estimating the collaboration between pairs of utterances).

We analyze the possibility of computing the collaboration for all zones in a chat. This involves computing $n(n-1)/2$ values, where n is the number of utterances, because any two utterances uniquely determine a region: the region that starts at the first utterance of the pair and ends at the second one. Recall that a zone must contain at least two utterances. All these values can be computed efficiently using an algorithm that has a low computational complexity. The key in devising this algorithm is to first compute the total collaboration of each region by pre-computing the values $c(p, q) = \sum_{i=p}^{q-1} w(i, q)$, where $w(i, q)$ is the weight of the edge between utterances i and q. Note that $c(p, q)$ is actually the contribution, in terms of total collaboration, that utterance q brings to the region starting at utterance p and ending at utterance q-1 when this region is extended to also include utterance q.

Now that we have the collaboration associated with each zone of the chat, one more aspect remains to be solved: we must select a set of zones to present as high collaboration regions.
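
The computation described above can be illustrated with a short sketch. It is not the authors' implementation: the function name `zone_collaborations`, the dictionary return type and, in particular, the choice of the zone length itself as the attenuation function are assumptions made only for illustration (the text only says that the divisor increases with the zone's length). The sketch first accumulates the contribution that utterance q brings when the zone p..q-1 is extended, then builds the totals incrementally, so that all zones are handled in time quadratic in the number of utterances.

```python
from typing import Callable, Dict, List, Tuple

Zone = Tuple[int, int]  # (index of first utterance, index of last utterance), inclusive


def zone_collaborations(
    w: List[List[float]],
    attenuation: Callable[[int], float] = lambda length: float(length),
) -> Dict[Zone, float]:
    """Return the attenuated collaboration of every zone (p, q) with p < q.

    w is a symmetric matrix holding the weights of the collaboration graph.
    `attenuation` is a hypothetical stand-in for the increasing function of
    the zone length; the default (the length itself) is an assumption.
    """
    n = len(w)

    # contrib[p][q]: sum of w[i][q] for i in p..q-1, i.e. what utterance q
    # adds when the zone p..q-1 is extended to include q.
    contrib = [[0.0] * n for _ in range(n)]
    for q in range(n):
        for p in range(q - 1, -1, -1):
            contrib[p][q] = contrib[p + 1][q] + w[p][q]

    # total[p][q]: sum of all edge weights inside the zone p..q.
    total = [[0.0] * n for _ in range(n)]
    attenuated: Dict[Zone, float] = {}
    for p in range(n):
        for q in range(p + 1, n):
            total[p][q] = total[p][q - 1] + contrib[p][q]
            attenuated[(p, q)] = total[p][q] / attenuation(q - p + 1)
    return attenuated


if __name__ == "__main__":
    # Three utterances with made-up weights.
    weights = [
        [0.0, 1.0, 0.5],
        [1.0, 0.0, 2.0],
        [0.5, 2.0, 0.0],
    ]
    for zone, value in sorted(zone_collaborations(weights).items()):
        print(zone, round(value, 3))
```

A different attenuation, for example the square root of the length, can be passed as the second argument to see how strongly long zones are penalized.
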
Simply presenting the zones whose collaboration is higher than some threshold raises the problem that overlapping regions will appear. Indeed, starting from a high collaboration zone and removing the last utterance will probably lead to another zone with high collaboration. However, we do not want both of these regions to appear in a selection of high collaboration zones. We used a greedy solution: at each step we choose the best collaboration region that has not yet been chosen or rejected, and then reject all other regions that overlap it. We say that two regions overlap if they have at least one utterance in common. To implement these ideas, we first sort the regions by collaboration value and then go through them starting from the one with the highest value, checking for each region whether it intersects any of the previously selected ones. Typical values of k, the number of selected regions, are in the range 10-20. In our corpus, chats generally had fewer than 400 utterances. For such values, analyzing a chat is almost instantaneous on any computer.

The greedy approach to selecting regions is not only computationally favorable; we claim that it also makes a good selection of regions. To gain some insight into this, consider the case illustrated in Figure 2. Suppose region S has a higher collaboration than each of the regions T1, T2, ..., Tn, which are also good collaboration regions (with collaboration above a threshold) that intersect S. Choosing S instead of T1, T2, ..., Tn seems a good alternative because, while S is the zone with the highest collaboration, all the Ti regions are probably good collaboration zones only because they share utterances with S.

Figure 2. Example of using the greedy algorithm to select collaboration regions.

Heuristics for Estimating Collaboration

This section discusses the second part necessary for implementing the automatic detection of collaboration regions.
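
Before moving on to the heuristics, the greedy selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `select_regions`, the representation of a zone as a pair of inclusive utterance indices, and the way the threshold and the limit k are combined are hypothetical choices, not details taken from the system.

```python
from typing import Dict, List, Tuple

Zone = Tuple[int, int]  # (index of first utterance, index of last utterance), inclusive


def select_regions(
    collaboration: Dict[Zone, float], threshold: float, k: int
) -> List[Zone]:
    """Greedily pick at most k mutually non-overlapping zones above threshold."""
    # Keep only zones above the threshold, best collaboration first.
    candidates = sorted(
        (zone for zone, value in collaboration.items() if value > threshold),
        key=lambda zone: collaboration[zone],
        reverse=True,
    )
    selected: List[Zone] = []
    for start, end in candidates:
        if len(selected) == k:
            break
        # Two zones overlap when they share at least one utterance, so a
        # candidate is accepted only if it lies entirely before or entirely
        # after every zone selected so far.
        if all(end < s or start > e for s, e in selected):
            selected.append((start, end))
    return selected


if __name__ == "__main__":
    # Made-up collaboration values for a few zones.
    scores = {(0, 3): 2.0, (0, 2): 1.8, (3, 5): 1.9, (4, 6): 1.5, (7, 8): 0.2}
    print(select_regions(scores, threshold=1.0, k=10))
```

Checking each candidate against the zones already selected costs at most k comparisons, which is negligible for the values of k mentioned in the text.
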
We have assumed so far that we know the collaboration score between individual utterances (the weights of the edges in the collaboration graph), and we described the selection algorithms without worrying about how these scores are obtained. In this section, we present some features used for estimating these collaboration scores.

The first feature taken into account is the explicit links. During a discussion in the VMT environment, participants can specify before sending a reply that it is a response to a previous one. Therefore, the fact that an utterance U2 has an explicit link to another one U1 suggests a strong collaboration between these two replies. Furthermore, if U1 contains an explicit link to another utterance U0, then this link, besides the collaboration between U1 and U0, also indicates some (weaker) collaboration between U2 and U0. In our implementation, starting from an utterance U, we go back along the chain of explicit links and associate progressively smaller collaborations between U and the utterances found.

Explicit links have some problems that must be addressed by other heuristics. Most importantly, the majority of chat systems do not offer this facility, and even when they do, it is not always used by the participants. If at a given moment of the conversation multiple parallel threads exist, these threads will never be joined by explicit links. It would be useful to determine whether these threads are really independent, or whether they just discuss alternative aspects of the same subject. In the second case, the threads are related and, overall, the zone should have a better collaboration than it would have if the threads were discussing distinct topics.

When explicit links are missing, or simply in order to adjust the values they provide, we can use another criterion for detecting links between utterances: common concepts. If the same word appears in two different but nearby replies, then there is probably some link between the two utterances.
We do not restrict the search to exactly the same word: if two words have the same stem or are synonyms, they are also considered the same concept, thereby increasing the connection between the utterances.

There can be zones in a discussion where participants collaborate, but outside the desired topics. These zones are not of interest to us, as they are off-topic. In order to increase the bonus for on-topic collaboration, a first criterion consists of using a list of keywords. If an utterance contains some of these words, it is probably on-topic, and therefore the values of the collaboration zones that contain it are increased.

Another feature used to evaluate the degree of collaboration is represented by speech acts and argument models. A speech act represents the function that an utterance has in a conversation. Some examples of speech acts are greeting, request, complaint and invitation. Generally, utterances that belong to a certain speech act may consist of a single word or may be arbitrarily long. However, for many speech acts, patterns can be identified that the corresponding utterances fit. Using a pattern matching mechanism, a module of the LTfLL project (www.ltfll-project.org) tries to detect the speech act of each utterance. Elements that belong to the process of argumentation are identified in the same way. These elements correspond, with small variations, to the model introduced by Stephen Toulmin and are the following: claim, ground, qualifier, rebuttal and concession (Toulmin, 1958).

We can use such information in our program. For example, an utterance that is a concession is probably involved in a high collaboration with another reply. On the other hand, a greeting is not interesting for us, so although the participant might have used an explicit link when greeting another, the collaboration involved should not be taken into account (as we only want to consider on-topic collaboration).
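
Two of the heuristics from this section, chains of explicit links and common words, can be made concrete with a small sketch. Everything numeric here is an assumption: the parameters `base`, `decay` and `bonus`, the geometric decay along the chain, and the exact way the common-word bonus is divided by the distance between utterances and by the word frequency (the results section describes these two factors only qualitatively). Tokenization is a plain whitespace split, with no stemming, synonyms or stop-word filtering.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Pair = Tuple[int, int]  # (earlier utterance, later utterance)


def explicit_link_weights(
    links: Dict[int, Optional[int]], base: float = 1.0, decay: float = 0.5
) -> Dict[Pair, float]:
    """Weights from explicit links, weaker for each step back along the chain.

    links maps an utterance id to the id it explicitly refers to, or None.
    base and decay are hypothetical parameters chosen for illustration.
    """
    weights: Dict[Pair, float] = {}
    for utterance in links:
        value = base
        target = links.get(utterance)
        seen = {utterance}  # guards against malformed, cyclic link data
        while target is not None and target not in seen:
            weights[(target, utterance)] = value
            seen.add(target)
            value *= decay
            target = links.get(target)
    return weights


def common_word_weights(utterances: List[str], bonus: float = 1.0) -> Dict[Pair, float]:
    """Bonus for shared words, divided by distance and by word frequency.

    Frequency is counted here as the number of utterances containing the
    word; this is one possible reading, not necessarily the authors' choice.
    """
    tokens = [set(text.lower().split()) for text in utterances]
    frequency = Counter(word for words in tokens for word in words)
    weights: Dict[Pair, float] = {}
    for j in range(len(tokens)):
        for i in range(j):
            shared = tokens[i] & tokens[j]
            if shared:
                weights[(i, j)] = sum(
                    bonus / ((j - i) * frequency[word]) for word in shared
                )
    return weights


if __name__ == "__main__":
    print(explicit_link_weights({0: None, 1: 0, 2: 1, 3: None}))
    print(common_word_weights(["wiki is best", "forum is fine", "wiki wins"]))
```

The two dictionaries can be added together, pair by pair, to fill in the edge weights of the collaboration graph; how the individual heuristics should be balanced against each other is left open here.
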

Results and Heuristics Evaluation

A first important result is obtained using only explicit links for evaluating the collaboration. A zone obtained using this method is shown in Figure 3. Analyzing the graph on the right of the figure (the nodes represent utterances and the edges represent explicit links), it can be seen that the edge density is good. Although we have shown earlier a series of problems that explicit links can have, if participants make intensive use of the facility, this criterion alone estimates good collaboration regions and the results obtained are promising. As stated before, an important problem is that participants do not always use this option.

We chose one of the chats from our corpus that contains many explicit links (in this discussion over 75% of the utterances use this facility). We computed the collaboration of every possible region twice: once using only the explicit-links heuristic, and once using all the other heuristics. The Pearson correlation coefficient between these two sets of values was then computed. If this coefficient is close to 1, then the results obtained using the second criterion are in agreement with those obtained using only the explicit links. Several alternative methods were explored in order to discover the most effective one, as shown below.

Variant 1: We used only a restricted version of the “common concepts” criterion defined above, giving a bonus if two replies have a common word, taking into account neither the distance between the utterances nor inflected forms of the same word. This way we obtained a correlation coefficient of 0.032. This value does not indicate any similarity to the values obtained using only the explicit links.

Variant 2 improves the criterion of common concepts by taking into account the distance between utterances.
The intuition is that even if two utterances share a common word, if they are far from each other they are probably not closely related. To implement this in our program, the constant bonus awarded for each common word is divided by the distance between the two utterances. This significantly increased the correlation, to 0.67.

Variant 3: we continue from variant 2, but also use the frequency of the repeated word. Consequently, when a common word is found, the bonus depends not only on the distance between the two utterances, but also on the frequency of the word: the more frequent it is in the chat, the less we increase the corresponding collaboration value. This produced another important increase in the Pearson correlation, to 0.79.

Variant 4: continuing from variant 3, but also matching words that have the same stem, we obtained a small improvement (correlation = 0.80). For this example, it seems that stemming does not provide any relevant improvement, but further tests should be undertaken to see more precisely how useful this feature is.

Variant 5: starting from variant 4 and taking into account some speech acts and argumentation elements (e.g. increasing the collaborations that involve a “concession”), we obtained a correlation of 0.82. Again, this is a small increase, and the precise effect must be analyzed further.

Overall, however, we obtained promising results, indicating that what can be obtained using the explicit link structure (which carries a lot of semantic information) can also be obtained using these kinds of heuristics.

Figure 3. A good collaboration zone obtained using only explicit links.

Figure 3 presents a screenshot of the system that was developed in order to better visualize the zones with high collaboration in a chat conversation. The chat log is in the upper left.
The first number (in square brackets) is the identifier of the utterance, while the second number is the identifier of the explicitly referenced utterance. Each participant has a distinct color, and the names may be automatically anonymized. The graph of utterances is displayed on the right of the image, with the edges representing explicit links, as mentioned before. The zones of good collaboration that were detected, together with their associated values, are listed below the conversation. The user can manually edit the collaboration zones and visualize discussion threads by clicking on a node in the graph on the right of the figure.

The developed system was verified by comparing its results with those provided by the PolyCAFe system (Trausan-Matu and Rebedea, 2010; Dascalu, Rebedea and Trausan-Matu, 2010) and with the zones manually identified by persons other than the developer. The results of the system were very similar to the manually identified zones and better than those of PolyCAFe.

Conclusions and Future Work

We implemented a system for automatically identifying the zones of a chat with good collaboration. First we created a theoretical framework, guided by an idea stated by the Russian philosopher Mikhail Bakhtin. We considered that between any two utterances there is some degree of collaboration. Starting from this point, we derived a few notions that led us to define what a good collaboration region is, and we implemented several algorithms that allow us to extract these regions efficiently. Besides this theoretical framework, we also devised several heuristics that estimate the collaboration between a pair of utterances. In this paper we described and analyzed these heuristics. We have shown that the explicit links, which carry semantic information, can be approximated using heuristics that are based on a largely lexical analysis.
These initial results are promising, and future research includes testing new heuristics, for example some based on Social Network Analysis (perhaps collaboration appears more often between persons with a similar rank). Another improvement would be to increase the number of heuristics by taking into account more powerful semantic similarity measures that would extend the lexical and WordNet based ones defined in the paper: to this end, LSA and more powerful semantic distances or lexical chains could be used.

References

Bakhtin, M. (1986). Speech Genres and Other Late Essays. University of Texas, Austin.
Dascalu, M., Rebedea, T. & Trausan-Matu, S. (2010). A Deep Insight in Chat Analysis: Collaboration, Evolution and Evaluation, Summarization and Search. AIMSA 2010, LNAI 6304, Springer, 191-200.
Stahl, G. (2006). Group Cognition: Computer Support for Building Collaborative Knowledge. MIT Press.
Stahl, G. (Ed.). (2009). Studying Virtual Math Teams. Springer, Boston.
Toulmin, S. (1958). The Uses of Argument. Cambridge Univ. Press.
Trausan-Matu, S., Dessus, P., Rebedea, T., Mandin, S., Villiot-Leclercq, E., Dascalu, M., Gartner, A., Chiru, C., Banica, D., Mihaila, D., Lemaire, B., Zampa, V., Graziani, E. Learning Support and Feedback, LTfLL Deliverable 5.2, http://ltfll-project.org/tl_files/LTfLL-Deliverables/LTfLL_D5.2.pdf, downloaded on November 7, 2010.
Trausan-Matu, S. & Rebedea, T. (2010). A Polyphonic Model and System for Inter-animation Analysis in Chat Conversations with Multiple Participants. In A. Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing (Vol. 6008, pp. 354-363). Springer Berlin / Heidelberg.
Vygotsky, L. (1978). Mind in Society. Cambridge, MA: Harvard University Press.

Acknowledgments

The research presented in this paper was partially performed under the FP7 EU STREP project LTfLL (http://www.ltfll-project.org/).
The authors would like to thank all the students and tutors at the University “Politehnica” of Bucharest who were involved in the validation experiments.

Publication date: 2011